390 research outputs found
Centering in-the-large: Computing referential discourse segments
We specify an algorithm that builds up a hierarchy of referential discourse
segments from local centering data. The spatial extension and nesting of these
discourse segments constrain the reachability of potential antecedents of an
anaphoric expression beyond the local level of adjacent center pairs. Thus, the
centering model is scaled up to the level of the global referential structure
of discourse. An empirical evaluation of the algorithm is supplied. Comment: LaTeX, 8 pages
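The "local centering data" that the segment-building algorithm consumes can be sketched in code. This is a generic illustration of Grosz/Joshi/Weinstein-style centering (backward-looking center and transition types), not the paper's hierarchy-building algorithm itself; the entities and salience ranking are invented for the example.

```python
# Generic sketch of local centering data; entities and Cf ranking are
# invented for illustration, and this is not the paper's own algorithm.

def backward_center(prev_cf, cur_mentions):
    """Cb(U_n): highest-ranked element of Cf(U_{n-1}) realized in U_n."""
    for entity in prev_cf:
        if entity in cur_mentions:
            return entity
    return None

def transition(prev_cb, cur_cb, cur_cp):
    """Classify the centering transition between adjacent utterances."""
    if cur_cb is None:
        return "NO-CB"
    if cur_cb == cur_cp:  # Cb equals the preferred center Cp
        return "CONTINUE" if prev_cb in (None, cur_cb) else "SMOOTH-SHIFT"
    return "RETAIN" if prev_cb in (None, cur_cb) else "ROUGH-SHIFT"

# Toy discourse; each Cf list is ordered by salience (subject first).
utterances = [
    ["John", "store"],  # "John went to the store."
    ["John", "milk"],   # "He bought milk."
    ["milk"],           # "It was expensive."
]

prev_cb, prev_cf = None, None
for cf in utterances:
    cb = backward_center(prev_cf, set(cf)) if prev_cf else None
    print(cb, transition(prev_cb, cb, cf[0]))
    prev_cb, prev_cf = cb, cf
```

Sequences of such adjacent-pair transitions are the raw material from which the paper's referential discourse segments are computed.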
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
We present BPEmb, a collection of pre-trained subword unit embeddings in 275
languages, based on Byte-Pair Encoding (BPE). In an evaluation using
fine-grained entity typing as testbed, BPEmb performs competitively, and for
some languages better than alternative subword approaches, while requiring
vastly fewer resources and no tokenization. BPEmb is available at
https://github.com/bheinzerling/bpem
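The Byte-Pair Encoding idea underlying these embeddings can be sketched in a few lines: repeatedly merge the most frequent adjacent symbol pair into a new subword unit. This is only a toy character-level illustration with an invented word list; the released BPEmb models are trained on Wikipedia text at much larger scale.

```python
# Toy character-level BPE: greedily merge the most frequent adjacent
# symbol pair. Word list is invented for illustration.
from collections import Counter

def learn_bpe(words, num_merges):
    """Return the list of learned merge operations."""
    vocab = {tuple(w): c for w, c in Counter(words).items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 4)
print(merges)
```

Because the units are learned from raw text frequencies, no language-specific tokenizer is needed, which is what makes the approach viable across 275 languages.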
Use Generalized Representations, But Do Not Forget Surface Features
Only a year ago, all state-of-the-art coreference resolvers were using an
extensive amount of surface features. Recently, there was a paradigm shift
towards using word embeddings and deep neural networks, where the use of
surface features is very limited. In this paper, we show that a simple SVM
model with surface features outperforms more complex neural models for
detecting anaphoric mentions. Our analysis suggests that using generalized
representations and surface features have different strength that should be
both taken into account for improving coreference resolution.Comment: CORBON workshop@EACL 201
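The kind of surface-feature model the abstract describes can be sketched as follows. The feature set and training data here are invented for illustration, and a tiny perceptron stands in for the paper's SVM; the point is only that cheap string-level cues carry real signal for anaphoricity.

```python
# Illustrative surface features for anaphoricity detection (feature set
# invented for this sketch); a tiny perceptron stands in for an SVM.

def surface_features(mention, prior_mentions):
    """Hand-crafted surface cues as a fixed-length feature vector."""
    head = mention.split()[-1].lower()
    return [
        1.0 if mention.lower().startswith(("the ", "this ", "that ")) else 0.0,
        1.0 if mention.lower() in ("he", "she", "it", "they") else 0.0,
        1.0 if any(mention.lower() == m.lower() for m in prior_mentions) else 0.0,
        1.0 if any(head == m.split()[-1].lower() for m in prior_mentions) else 0.0,
        float(len(mention.split())),
    ]

def perceptron_train(data, epochs=10):
    """Plain perceptron over the feature vectors; labels are -1/+1."""
    w, b = [0.0] * 5, 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

train = [
    (surface_features("the company", ["a company"]), 1),
    (surface_features("he", ["John"]), 1),
    (surface_features("a new proposal", []), -1),
    (surface_features("some analysts", []), -1),
]
w, b = perceptron_train(train)
score = sum(wi * xi for wi, xi in zip(w, surface_features("it", ["a deal"]))) + b
print("anaphoric" if score > 0 else "non-anaphoric")  # → anaphoric
```

A model this simple has no access to distributional generalization, which is exactly why the abstract argues for combining both kinds of representation.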
Lexical Features in Coreference Resolution: To be Used With Caution
Lexical features are a major source of information in state-of-the-art
coreference resolvers. Lexical features implicitly model some of the linguistic
phenomena at a fine granularity level. They are especially useful for
representing the context of mentions. In this paper we investigate a drawback
of using many lexical features in state-of-the-art coreference resolvers. We
show that if coreference resolvers mainly rely on lexical features, they can
hardly generalize to unseen domains. Furthermore, we show that the current
coreference resolution evaluation is flawed: it evaluates only on a specific
split of a specific dataset in which there is a notable overlap between the
training, development and test sets. Comment: 6 pages, ACL 201
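The overlap problem the abstract points to can be made concrete with a simple diagnostic: what fraction of test-set mention strings already occur verbatim in training? The data below is invented for illustration; the paper measures this on actual benchmark splits.

```python
# Toy diagnostic for train/test mention overlap (data invented for
# illustration): a lexicalized model can memorize the "seen" fraction.

def seen_ratio(train_mentions, test_mentions):
    """Fraction of test mentions whose exact string occurs in training."""
    seen = {m.lower() for m in train_mentions}
    hits = sum(1 for m in test_mentions if m.lower() in seen)
    return hits / len(test_mentions)

train = ["the president", "Obama", "the White House", "he", "his visit"]
test = ["the president", "he", "the senator", "Obama"]
print(f"{seen_ratio(train, test):.0%}")  # 3 of 4 test mentions seen in training
```

When this ratio is high, strong test scores may reflect memorized lexical pairs rather than genuine generalization, which is the caution the paper raises.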
Incremental Centering and Center Ambiguity
In this paper, we present a model of anaphor resolution within the framework
of the centering model. The consideration of an incremental processing mode
introduces the need to manage structural ambiguity at the center level. Hence,
the centering framework is further refined to account for local and global
parsing ambiguities which propagate up to the level of center representations,
yielding moderately adapted data structures for the centering algorithm. Comment: 6 pages, uuencoded gzipped PS file (see also Technical Report at:
http://www.coling.uni-freiburg.de/public/papers/cogsci96-center.ps.gz)
Investigating Multilingual Coreference Resolution by Universal Annotations
Multilingual coreference resolution (MCR) has been a long-standing and
challenging task. With the newly proposed multilingual coreference dataset,
CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by
using its harmonized universal morphosyntactic and coreference annotations.
First, we study coreference by examining the ground truth data at different
linguistic levels, namely mention, entity and document levels, and across
different genres, to gain insights into the characteristics of coreference
across multiple languages. Second, we perform an error analysis of the most
challenging cases that the SotA system fails to resolve in the CRAC 2022 shared
task using the universal annotations. Last, based on this analysis, we extract
features from universal morphosyntactic annotations and integrate these
features into a baseline system to assess their potential benefits for the MCR
task. Our results show that our best configuration of features improves the
baseline by 0.9% F1 score. Comment: Accepted at Findings of EMNLP202
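Extracting features from harmonized universal annotations can be sketched as follows. The field layout mimics CoNLL-U-style UPOS and FEATS columns, but the specific feature names chosen here are hypothetical and not the paper's exact feature set.

```python
# Hypothetical sketch: expose universal morphosyntactic attributes of a
# mention head (CoNLL-U-style UPOS/FEATS) as categorical features for a
# coreference baseline. Feature names are invented for illustration.

def morph_features(token):
    """token: dict with CoNLL-U-like 'upos' and 'feats' fields."""
    feats = dict(kv.split("=") for kv in token["feats"].split("|") if "=" in kv)
    return {
        "upos": token["upos"],
        "number": feats.get("Number", "NA"),
        "gender": feats.get("Gender", "NA"),
        "person": feats.get("Person", "NA"),
    }

head = {"form": "ella", "upos": "PRON", "feats": "Gender=Fem|Number=Sing|Person=3"}
print(morph_features(head))
```

Because CorefUD harmonizes these annotations across languages, the same feature extractor can feed a single multilingual baseline, which is what makes the cross-lingual comparison in the paper possible.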